Abstract

Many studies have attempted to identify expression
changes in oral squamous cell carcinoma, but most
include too few samples for statistical methods to
detect significant gene sets. This study
combined two small microarray datasets from a public 
database and identified significant genes associated 
with the progress of oral squamous cell carcinoma. 
There were different expression scales between the two 
datasets, even though these datasets were generated 
under the same platforms - Affymetrix U133A gene 
chips. We discretized gene expressions of the two data- 
sets by adjusting the differences between the datasets 
for detecting the more reliable information. From the 
combination of the two datasets, we detected 51 signifi- 
cant genes that were upregulated in oral squamous cell 
carcinoma. Most of them were published in previous 
studies as cancer-related genes. From these selected 
genes, significant genetic pathways associated with ex- 
pression changes were identified. By combining several 
datasets from the public database, sufficient samples 
can be obtained for detecting reliable information. Most 
of the selected genes were known as cancer-related 
genes, including oral squamous cell carcinoma. Several 
unknown genes can be biologically evaluated in further 
studies. 

Keywords: combined dataset, genetic pathway, oral 
squamous cell carcinoma, public microarray database, 
significant gene 

Introduction 

Despite recent advances in surgical, radiation, and
chemotherapeutic treatment protocols, the prognosis of oral
squamous cell carcinoma (OSCC) remains poor, with
an approximate 50% 5-year mortality rate from disease
or associated complications [1]. Therefore, the
identification of biological markers is essential to make
progress in detecting malignancy at an early stage and
developing novel therapies [2].

Microarray datasets that are created for the same re- 
search purposes in different laboratories have accumu- 
lated rapidly. The results from different datasets are of- 
ten inconsistent due to the utilization of different plat- 
forms, sample preparations, or various technical varia- 
tions. If we could combine such datasets by adjusting 
for systematic biases that exist among different datasets 
derived from different experimental conditions, the pow- 
er of statistical tests would be improved by the increase 
in sample size [3]. 

In OSCC, although many microarray-based studies
have been conducted to provide insights into gene
expression changes, most have contained too few
samples to detect reliable information by statistical
analysis [4, 5]. Therefore, this study attempted to
combine several datasets from a public database
to detect significant genes.

We used two small microarray datasets of OSCC for 
this study, which were based on the same platform but 
had different expression scales. These two datasets 
were combined after discretization, because a previous
study showed that classification could be improved
using combined datasets after discretization [3]. After
combining the datasets, we used the chi-square test to
identify significant genes. The chi-square test has
commonly been used to detect differentially expressed
genes after discretization of expression intensities in
microarray experiments.

In this study, the gene expression ratios of the two
datasets were transformed into ranks within each dataset.
Next, the transformed datasets were combined, and a 
nonparametric statistical method was applied to the 
combined dataset to detect informative genes. Finally, 
we showed that most of the selected genes were 
known to be involved in various cancers, including 
OSCC. 
Table 1. Summary of two microarray datasets from GEO and the combined dataset

Data name          Experimental platform   No. of genes   No. of total samples   Normal group   Tumor group
Data 2004 [4]      Affymetrix U133A        14,119         20                     4              16
Data 2005 [5]      Affymetrix U133A        22,283         27                     5              22
Combined dataset                           14,119         47                     9              38

GEO, Gene Expression Omnibus.


















Table 2. Combination of contingency tables for three datasets (t_ij = a_ij + b_ij + c_ij)

      Dataset A             Dataset B             Dataset C             Combined dataset
      P1    P2    P3        P1    P2    P3        P1    P2    P3        P1    P2    P3
E1    a11   a12   a13       b11   b12   b13       c11   c12   c13       t11   t12   t13
E2    a21   a22   a23   +   b21   b22   b23   +   c21   c22   c23   =   t21   t22   t23
E3    a31   a32   a33       b31   b32   b33       c31   c32   c33       t31   t32   t33

P1, P2, and P3 represent the three different phenotypes. E1, E2, and E3 represent three groups by rank of gene expressions. a_ij, b_ij, and
c_ij are the numbers of experiments belonging to Pj and Ei at the same time in data A, data B, and data C, respectively.



Methods 

Dataset 

Two microarray datasets were used for this study. We 
acquired these datasets from a public database (Gene 
Expression Omnibus, GEO). One was the expression da- 
taset of 16 tumors and 4 normal tissues from 16 pa- 
tients, using Affymetrix U133A gene chips (Affymetrix, 
Santa Clara, CA, USA). The other microarray dataset 
consisted of expression profiles of 22 tumors and 5
normal tissues. Both datasets were generated on the
same platform, Affymetrix U133A. The datasets are
summarized in Table 1.

Process for combining datasets 

To combine datasets, the expression ratios of each gene
are sorted within each dataset, and the resulting ranks
are matched with the corresponding experimental groups.
If the experimental groups are homogeneous, the ranks
within the same experimental group will be adjacent.
The discretization of gene expressions is summarized in
the following steps [3]:

(1) Rank the gene expression ratios within a gene for 
each dataset. 

(2) List in order of the ranks, and assign the order of 
gene expressions to the corresponding experimen- 
tal groups. 

(3) Summarize the result of (2) in the form of a con- 
tingency table for each gene. 

(4) Combine the contingency tables that have been
summarized for each dataset.

When there are three datasets to be combined, the
datasets can be added as a single entry, as shown in
Table 2, after the transformation of each dataset by rank.

Table 3. Summary of discretized data using ranks of gene expressions

                                     Experimental groups by phenotypes
                                     P1     P2     P3     Marginal sum
Experimental group by rank    E1     n11    n12    n13    r1
of gene expression            E2     n21    n22    n23    r2
                              E3     n31    n32    n33    r3
Marginal sum                         c1     c2     c3     n
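The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy expression values, and the rule that rank groups have the same sizes as the phenotype groups are all assumptions made for the example (the text does not state how rank-group boundaries were chosen).

```python
def rank_groups(values, group_sizes):
    """Steps (1)-(2): sort one gene's values within a dataset and cut the
    ranked list into consecutive rank groups (0 = lowest ranks).

    Assumption: rank groups are sized like the phenotype groups, so a gene
    that separates the phenotypes perfectly gives a diagonal table.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    start = 0
    for g, size in enumerate(group_sizes):
        for i in order[start:start + size]:
            groups[i] = g
        start += size
    return groups


def contingency_table(values, phenotypes, labels):
    """Step (3): rows are rank groups E1..Ek, columns follow `labels`."""
    sizes = [phenotypes.count(lab) for lab in labels]
    groups = rank_groups(values, sizes)
    table = [[0] * len(labels) for _ in labels]
    for g, p in zip(groups, phenotypes):
        table[g][labels.index(p)] += 1
    return table


def combine_tables(tables):
    """Step (4): cell-wise sum, t_ij = a_ij + b_ij + ..."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*tables)]


# Hypothetical values for one gene in two datasets on different scales.
labels = ["normal", "tumor"]
table_a = contingency_table(
    [0.2, 0.5, 3.1, 0.4, 2.8],
    ["normal", "normal", "tumor", "tumor", "tumor"], labels)
table_b = contingency_table(
    [120.0, 95.0, 900.0, 1500.0],
    ["normal", "normal", "tumor", "tumor"], labels)
combined = combine_tables([table_a, table_b])
print(table_a, table_b, combined)
```

Because only ranks within each dataset are used, the two datasets contribute comparable counts even though their raw expression scales differ.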

Identification of significant genes from a com- 
bined dataset 

After the summarization of gene expression ratios in the 
form of a contingency table for each gene, as shown in 
Table 3, a nonparametric statistical method was applied 
to the datasets for independence testing between gene 
expression patterns and experimental groups. The test
statistic is calculated as follows for each gene:

$$\chi^2 = \sum_{i}\sum_{j} \frac{\left(n_{ij} - E(n_{ij})\right)^2}{E(n_{ij})}, \qquad E(n_{ij}) = \frac{r_i c_j}{n}$$

When the sample size is small (generally E(n_ij) less
than 5), Fisher's exact test is recommended rather than
the chi-square test.
The significant genes can be selected by an
independence test between the phenotypes and gene
expressions using this type of summarized dataset. c_j and
r_i represent the marginal sums of the jth column and
ith row, respectively. n_ij is the number of experiments
belonging to E_i and P_j, and n represents the total number
of experiments.
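As an illustration of the independence test described above, the following sketch computes the expected counts and the Pearson chi-square statistic for one gene's contingency table, together with a two-sided Fisher's exact test for the 2 x 2 case. It is a hedged example using only the Python standard library (`math.comb` needs Python 3.8 or later); the function names and the example table are hypothetical, all row and column sums are assumed to be non-zero, and the p-value helper is valid only for 2 x 2 tables (one degree of freedom).

```python
import math


def expected_counts(table):
    """E(n_ij) = r_i * c_j / n under independence."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


def chi_square_statistic(table):
    """Sum over all cells of (n_ij - E(n_ij))^2 / E(n_ij)."""
    expected = expected_counts(table)
    return sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))


def chi_square_p_value_2x2(table):
    """Upper-tail p-value for 1 degree of freedom (2 x 2 tables only)."""
    return math.erfc(math.sqrt(chi_square_statistic(table) / 2.0))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value: total probability of all tables
    with the same margins that are no more likely than the observed one."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical combined table (rows: rank groups, columns: normal, tumor).
table = [[3, 1], [1, 4]]
small_expected = any(e < 5 for row in expected_counts(table) for e in row)
print(chi_square_statistic(table), chi_square_p_value_2x2(table))
print("use Fisher's exact test:", small_expected, fisher_exact_2x2(table))
```

In this toy table every expected count is below 5, which is the situation in which the text recommends Fisher's exact test over the chi-square test.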

Results 

The clinical information and expression levels of the two
datasets are summarized in Table 4 and Fig. 1. Subgroup
and sex were similarly distributed in the two datasets.
The distributions of other factors were not included.

Table 4. Summary of two microarray datasets

                                  Data 2004      Data 2005
Subgroup
  Tumor                           16             22
  Normal                          4              5
Sex
  Male                            15             21
  Female                          5              6
Age (mean, standard deviation)    56.9 (10.22)   60.03 (14.16)
Primary site
  Tongue                          7              16
  Floor of mouth                  9              5
  Other                           4              6
T stage
  T1                              1              4
  T2                              7              8
  T3                              1              4
  T4                              9              10
  Missing                         2              1

The scale of expression levels in the two datasets 
was different; the expression values of Data 2004 
ranged from 0.01 to 740, and those of Data 2005 were 
from 0.1 to 19,773. The expression patterns of the two 
datasets can be explored in Fig. 1. 

Many outliers appear in Fig. 1A, where the two datasets
contain the whole gene set. In the subsets of significant
genes, however, the expression ranges narrowed and
the outliers decreased (Fig. 1B). The expression levels
of tumor tissues in Data 2004 were upregulated and
more variable compared with normal tissues. Were it not
for an outlier with the maximum value in the 14th tumor
tissue in Data 2004, the expression levels of the two
groups would be clearly distinguished. No clear
difference in expression was shown between the two
groups in Data 2005.



Upregulated 51 genes in oral squamous cell
carcinoma

To identify differentially expressed genes between normal
and tumor tissues, we performed the chi-square test using
the combined microarray dataset. Fifty-one significant
genes, all upregulated in OSCC tissues, were selected
from the combined dataset with a p-value less than
0.005. The significance level can be adjusted, and more
genes can be selected with a less stringent (higher)
significance level. These selected genes are summarized
in Table 5.



Fig. 1. Comparison of expression levels of two datasets
(Data 2004 and Data 2005; normal vs. tumor samples). (A)
Whole gene set. (B) Selected gene set.
Many of the selected genes were known
as cancer-related genes. STAT1 [6], SKP2 [7], IFI16 [8],
RHEB [9], IFI44 [10], SOD2 [11, 12], and GREM1 [11]
are related to OSCC. Table 6 [13-56] summarizes the
previous studies that have reported relations between the
selected genes and cancer.



Table 5. Summary of selected 51 upregulated genes

Affymetrix No Gene       Description                                                   Fold change
200037_s_at   CBX3       Chromobox homolog 3 (hp1 gamma homolog, drosophila)           2.219978
200056_s_at   C1D        Nuclear dna-binding protein                                   2.448721
200887_s_at   STAT1      Signal transducer and activator of transcription 1, 91kda     4.307249
201091_s_at   CBX3       Chromobox homolog 3 (hp1 gamma homolog, drosophila)           3.647541
201486_at     RCN2       Reticulocalbin 2, ef-hand calcium binding domain              2.279745
201518_at     CBX1       Chromobox homolog 1 (hp1 beta homolog drosophila)             2.132493
201663_s_at   SMC4       Smc4 structural maintenance of chromosomes 4-like 1 (yeast)   2.434400
202633_at     TOPBP1     Topoisomerase (dna) ii binding protein 1                      2.189444
203038_at     PTPRK      Protein tyrosine phosphatase, receptor type, k                3.345238
203301_s_at   DMTF1      Cyclin d binding myb-like transcription factor 1              1.378319
203562_at     FEZ1       Fasciculation and elongation protein zeta 1 (zygin i)         2.853794
203566_s_at   AGL        Amylo-1,6-glucosidase, 4-alpha-glucanotransferase             2.114894
203595_s_at   IFIT5      Interferon-induced protein with tetratricopeptide repeats 5   2.664490
203625_x_at   SKP2       S-phase kinase-associated protein 2 (p45)                     2.007377
203744_at     HMGB3      High-mobility group box 3                                     2.974931
203964_at     NMI        N-myc (and stat) interactor                                   3.840395
204211_x_at   EIF2AK2    Eukaryotic translation initiation factor 2-alpha kinase 2     1.994068
204439_at     IFI44L     Interferon-induced protein 44-like                            124.396853
204822_at     TTK        ttk protein kinase                                            2.414220
204825_at     MELK       Maternal embryonic leucine zipper kinase                      3.755818
206765_at     KCNJ2      Potassium inwardly-rectifying channel, subfamily j, member 2  1.810372
207438_s_at   SNUPN      rna, u transporter 1                                          1.913825
208079_s_at   AURKA      Aurora kinase a                                               3.848891
208966_x_at   IFI16      Interferon, gamma-inducible protein 16                        2.568727
209095_at     DLD        Dihydrolipoamide dehydrogenase                                1.476130
209524_at     HDGFRP3    Hepatoma-derived growth factor, related protein 3             2.724985
209903_s_at   ATR        Ataxia telangiectasia and rad3-related                        1.635679
210283_x_at   PAIP1      Poly(a) binding protein interacting protein 1                 1.997611
211725_s_at   BID        bh3 interacting domain death agonist                          3.476190
211727_s_at   COX11      Cox11 homolog, cytochrome c oxidase assembly protein          1.419895
212314_at     KIAA0746   kiaa0746 protein                                              10.323529
212765_at     CAMSAP1L1  Calmodulin-regulated spectrin-associated protein 1-like 1     1.717589
212959_s_at   GNPTAB     Hypothetical protein dkfzp762b226                             1.733743
213008_at     FANCI      kiaa1794                                                      2.935005
213104_at     C16ORF42   Hypothetical protein mgc24381                                 2.059115
213294_at     CCDC75     Coiled-coil domain-containing 75                              4.261916
213404_s_at   RHEB       ras homolog enriched in brain                                 1.536225
213452_at     ZNF184     Zinc finger protein 184 (kruppel-like)                        1.534287
213679_at     TTC30A     Hypothetical protein flj13946                                 2.374943
214453_s_at   IFI44      Interferon-induced protein 44                                 11.920148
215223_s_at   SOD2       Superoxide dismutase 2, mitochondrial                         4.950142
215495_s_at   SAMD4A     Sterile alpha motif domain containing 4a                      3.204074
216841_s_at   SOD2       Superoxide dismutase 2, mitochondrial                         4.790233
217901_at     DSG2       Desmoglein 2                                                  5.614525
218469_at     GREM1      Gremlin 1, cysteine knot superfamily, homolog                 3.366686
218627_at     DRAM       Damage-regulated autophagy modulator                          2.780824
218901_at     PLSCR4     Phospholipid scramblase 4                                     3.663654
218986_s_at   FLJ20035   Hypothetical protein flj10787                                 6.364550
219087_at     ASPN       Asporin (lrr class 1)                                         7.895878
219372_at     IFT81      Intraflagellar transport 81 homolog (chlamydomonas)           1.875798
219787_s_at   ECT2       Epithelial cell transforming sequence 2 oncogene              4.242975
Table 6. Association of the selected genes and cancer


Gene


Cancer association


References


OSCC association


References


Fold change


CBX3 










2.219978 


C1D 


Yes 


Yang et al. [13] 






2.448721 


STAT1 


Yes 


Hiroi et al. [6]


Yes


Hiroi et al. [14]


4.307249






Laimer et al. [15] 








RCN2 


Yes 


Cavallo et al. [16] 






2.279745 


CBX1 


Yes 


Luo et al. [17] 






2.132493 


SMC4 










2.434400 


TOPBP1 


Yes 


Going et al. [18]






2.189444 


PTPRK 


Yes 


Starr et al. [19] 






3.345238 






Flavell et al. [20] 








DMTF1 


Yes 


van Dekken et al. [21] 






1.378319 


FEZ1 


Yes 


Califano et al. [22] 






2.853794 






Chen et al. [23] 








AGL 


Yes 


Fabris et al. [24] 






2.114894 


IFIT5 








Ben-Izhak et al. [7]


2.664490 


SKP2 


Yes 


Shintani et al. [25] 


Yes 




2.007377 


HMGB3 


Yes 


Hayes et al. [26] 






2.974931 


NMI 


Yes 


Fillmore et al. [27] 






3.840395 






Quaye et al. [28] 








EIF2AK2 










1.994068


IFI44L 










124.396853 


TTK 


Yes 


Harima et al. [29] 






2.414220 






Kono et al. [30] 












de Carcer et al. [31]












Suda et al. [32] 








MELK 


Yes 


Pickard et al. [33] 






3.755818 






Kappadakunnel et al. [34] 








KCNJ2 


Yes 


Galeza-Kulik et al. [35]






1.810372 


SNUPN 








1.913825 


AURKA 


Yes 


Torchia et al. [36] 






3.848891 






Chen et al. [37] 












Kaestner et al. [38] 




De Andrea et al. [8] 




IFI16 


Yes 


Alimirah et al. [39] 


Yes 




2.568727 






Zhang et al. [40] 








DLD




Ortega-Paino et al. [41] 






1.476130


HDGFRP3 


Yes 








2.724985 


ATR 










1.635679


PAIP1 










1.997611


BID 




Ahmed et al. [42] 






3.476190 




COX11

Yes 








1.419895


KIAA0746 










10.323529 


CAMSAP1L1










1.717589


GNPTAB 




Zhi et al. [43] 






1.733743


FANCI 


Yes 


Barroso et al. [44] 






2.935005 


C16ORF42










2.059115 


CCDC75 








Chakraborty et al. [9] 


4.261916 


RHEB 






Yes 




1.536225


ZNF184 










1.534287 


TTC30A 




Lee et al. [45] 




Ye et al. [11] 


2.374943 


IFI44 


Yes 


Skrzycki et al. [46] 


Yes 


Liu et al. [12] 


11.920148 


SOD2 


Yes 


Olson et al. [47] 


Yes 


Ye et al. [10] 


4.950142 


SAMD4A 




Lorch et al. [48] 






4.790233 



OSCC, oral squamous cell carcinoma. 
Table 6. Continued



Gene 


Cancer 
association 


References 


OSCC 
association 


References 


Fold change 


DSG2 


Yes 


Lorch et al. [49] 




Ye et al. [11] 


5.614525 


GREM1 




Crighton et al. [50] 


Yes 




3.366686 


DRAM 


Yes 


Crighton et al. [51] 






2.780824 


PLSCR4 










3.663654 


FLJ20035 




Mackay et al. [52] 






6.364550 


ASPN 


Yes 


Turashvili et al. [53] 






7.895878 


IFT81 




Fields and Justilien [54] 






1.875798


ECT2 


Yes 


Boelens et al. [55] 
Hirata et al. [56] 






4.242975 



OSCC, oral squamous cell carcinoma. 




Fig. 2. Expression patterns of the selected 51 genes. 
These genes were upregulated in oral squamous cell carci- 
noma tissues, and normal and tumor groups were clearly 
classified with these genes. 

Expression pattern of the identified genes 

To investigate whether the different experimental groups 
could be classified with significant genes, an unsuper- 
vised hierarchical clustering method was applied to the 
significant gene set (Fig. 2). 
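The kind of unsupervised hierarchical clustering referred to here can be illustrated with a small agglomerative example. The text does not state the distance measure or the linkage rule, so the Euclidean distance and average linkage used below are assumptions, as are the function names and the toy expression profiles.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage_clusters(profiles, n_clusters=2):
    """Agglomerative clustering: repeatedly merge the two clusters with
    the smallest average pairwise distance until n_clusters remain.
    Returns clusters as sorted lists of sample indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        total = sum(euclidean(profiles[i], profiles[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs,
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical profiles over three genes: samples 0-1 normal, 2-4 tumor.
profiles = [
    [1.0, 1.1, 0.9],
    [1.2, 1.0, 1.0],
    [3.0, 2.8, 3.2],
    [2.9, 3.1, 3.0],
    [3.3, 3.0, 2.7],
]
print(average_linkage_clusters(profiles, n_clusters=2))
```

If the selected genes separate the phenotypes, the two top-level clusters should coincide with the normal and tumor groups, which is the pattern reported for Data 2004.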

The normal group consisted of 4 tissues and showed
significantly lower expression levels than the tumor
group. In Fig. 2, we examined whether the identified
genes could classify the samples in Data 2004 rather
than in the combined dataset, because the two datasets
have different expression scales.

Network analysis 

Based on all identified genes, new and expanded pathway
maps and connections and specific gene-gene
interactions were inferred, functionally analyzed, and used
to build on the existing pathway using the Ingenuity
Pathway Analysis (IPA) knowledge base [57].

To generate networks in this work, the knowledge 
base was queried for interactions between the identified 
genes and all other genes stored in the database. Four 
networks were found to be significant in OSCC. The 
network with the highest score (Network 1, score = 36) 
was generated, with 17 identified genes (Table 7, Fig. 
3). 

In the network diagram, STAT1 and SOD2 neighbored
NMI and AURKA, respectively. The expression levels
of STAT1 and SOD2 could therefore be expected to be
related to those of NMI and AURKA. Indeed, the
expression levels of STAT1 and SOD2 were strongly
positively correlated with NMI (r = 0.95) and AURKA
(r = 0.87), respectively.
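A correlation coefficient of this kind can be computed as in the sketch below. The text does not say which coefficient was used; Pearson's r is assumed here, and the function name and the expression values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression levels of two neighboring genes across samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(gene_a, gene_b))
```

Values of r close to 1 indicate that the two genes rise and fall together across samples, which is what the network neighborhood suggests for the gene pairs named above.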

Discussion 

OSCC is associated with substantial mortality and mor- 
bidity [58]. To identify potential biomarkers for early de- 
tection of invasive OSCC, microarray experiments have 
been conducted, and these kinds of microarray datasets 
have accumulated rapidly in the public database. 
However, there are many datasets that include in- 
sufficient sample sizes for detecting significant genes by 
statistical analysis. Therefore, this study attempted to 
combine several microarray datasets from a public data- 
base to identify significant candidates as biomarkers. 

In microarray data analysis, the information from
different datasets obtained under different experimental
conditions may be inconsistent, even when the experiments
share the same research objectives. Moreover,
even when the datasets are generated on the same
platform, their agreement may be affected by technical
variations between laboratories. In such cases, it
may be necessary to use a combined dataset after
adjusting for the differences between the datasets in
order to detect more reliable information. Combining
datasets is especially useful for OSCC microarray datasets,
because many of them have insufficient sample sizes for
analysis [4, 5, 59, 60].

Table 7. Four networks generated by upregulated genes in OSCC

Network 1
  Genes in Ingenuity network (a): Akt, ATR (includes EG:545), AURKA, BID, C11ORF30, CBX1, CBX3, Ck2, Cyclin A,
    Cytochrome c, EIF2AK2, ERK, GREM1, GZMK, Histone h3, Histone h4, IFI16,
    IFN TYPE 1, IFNA3, Interferon alpha, NFkB (complex), NMI, PDGF BB, PI3K, PIF,
    Proteasome, RHEB, SKP2, SMC4, SNUPN, SOD2, STAT1, Tgf beta, TOPBP1, TTK
  Function: Cancer, cellular response to therapeutics, cell cycle
  Score (b): 36

Network 2
  Genes in Ingenuity network (a): AGL, ASPN, beta-estradiol, BTG1, C1D, COX11, DDX60, DNAJB4, DSC2, DSG2,
    ECT2, FGF13, GBP1 (includes EG:2633), HNF4A, IFI44, IFI44L, IFIT5, IFNA2,
    IFNA4, IFNA6, IFNA7, IFNA5 (includes EG:3442), KCNJ2, MAPK14, MST1, MYOG,
    NUP153, PARP9, PTPRK, RCN2, SMAD3, SSTR1, TGFB1, TGTP, TMF1
  Function: Cell-mediated immune response, embryonic development, antigen presentation
  Score (b): 28

Network 3
  Genes in Ingenuity network (a): CAMSAP1L1, CDC25A, CDKN2A, DHFR, DISC1, DLD, DMTF1, DRAM
    (includes EG:55332), E2F4, FANCI, FEZ1, GNB2L1, GNPTAB, HMGB3, IFI202B,
    LBR, MCM3, MCM5, MELK, MKI67, MLC1, PABPC1, PAIP1, PDHB, Pias, PLSCR4,
    PRMT1, RUVBL2, SAMD4A, SLC2A4, TFDP1, TK1, TP53, TRA2B, YWHAG
  Function: Cell cycle, connective tissue development and function, cell death
  Score (b): 24

Network 4
  Genes in Ingenuity network (a): CAMSAP1L1, CDC25A, CDKN2A, DHFR, DISC1, DLAT, DLD, DMTF1, DRAM
    (includes EG:55332), E2F4, EIF4A, FANCI, FEZ1, GNPTAB, HMGB3, IFI202B, LBR,
    MCM3, MCM5, MELK, MKI67, MLC1, PABPC1, PAIP1, PDHB, Pias, PLSCR4,
    PRMT1, RUVBL2, SAMD4A, SLC2A4, TFDP1, TP53, TRA2B, YWHAG
  Function: Cell cycle, connective tissue development and function, lipid metabolism
  Score (b): 24

OSCC, oral squamous cell carcinoma.
(a) Genes in bold were identified in this study; other genes were either not on the expression array used in this work or did not change
significantly. (b) A score > 3 was considered significant.

Fig. 3. Network with the highest score (Network 1).
Functional relationships between genes based on known
interactions in Ingenuity Pathway Analysis (IPA) knowledge are
described.

To identify significant genes that classify tumor and
normal groups, we obtained two microarray datasets
from a public database, GEO. They included 20 and 27
samples, and in each dataset the sample sizes were
unbalanced between the groups. By combining these two
datasets, the sample size was increased, and we had a
sufficient sample size for statistical analysis, even
though it was still unbalanced. When these datasets
were combined, we used the ranks of gene expression,
because the scales of gene expression were different. In
this study, we identified 51 significant genes from the
combined dataset, and this number could be increased
or decreased by changing the significance level (we used
0.005). The selected 51 genes were upregulated in tumor
tissues. Many of the selected genes have been shown to
be cancer-related in previous studies.

SOD2 is associated with lymph node metastasis in
OSCC and may provide predictive value for the
diagnosis of metastasis [10]. Metastasis is a critical event in
OSCC progression. An SOD2 variant has also been
associated with increased breast cancer and ovarian
cancer risk in previous studies [47, 61]. TopBP1 contains
eight BRCT domains (originally identified in BRCA1) and
has been proposed as a breast cancer susceptibility gene
[18, 62].

By semiquantitative reverse transcription PCR analy- 
sis, RHEB was shown to be upregulated in OSCC [9]. 
In salivary cancer, survival probability rates dropped 



30 Genomics & Informatics Vol. 10(1) 23-32, March 2012 



when Skp2 was overexpressed [7]. Overexpression of 
Skp2 is associated with the reduction of p27 (KIP1) ex- 
pression and may have a role in the progression of 
OSCC [25]. 

The expression of RCN2 was linearly related to the
tumor mass increase, and its expression was increased
in breast cancer [16]. PTPRK was identified as a
candidate gene for colorectal cancer [19], and it is a
functional tumor suppressor in Hodgkin lymphoma cells [20].
DMTF1, residing at 7q21, was shown by aCGH experiments
to be amplified in adenocarcinoma of the
gastroesophageal junction [21]. FEZ1 was involved in ovarian
carcinogenesis, and its reduction or loss could be an
aid to the clinical management of patients affected by
ovarian carcinoma [22]. It is also a known tumor
suppressor gene in breast cancer and gastric cancer [23,
63].

Other ovarian cancer-related genes were NMI [27, 28] 
and FANCI [44]; breast cancer-related genes were COX11 
[42], MELK [33], and FANCI [44] among the selected 
genes. MELK was known to be associated with shorter 
survival in glioblastoma [34]. 

TTK was associated with progression and metastasis 
of advanced cervical cancers after radiotherapy [29, 30]. 
It might also be a relevant candidate as a new target 
in cancer therapy, since it plays relevant roles in mitotic 
progression and the spindle checkpoint [31, 32]. Aurora 
kinase A (AURKA) was associated with skin tumors [36] 
and colorectal cancer [37, 38]. 

In previous studies, the OSCC-related genes among the
selected genes were STAT1 [14], SKP2 [7, 25], IFI16 [8],
RHEB [9], IFI44 [64], SOD2 [10-12], and GREM1 [11].
The remaining genes, which have not yet been shown to
be OSCC-related, are candidates for confirmation by
biological evaluation.

In this study, we identified significant genes related
to OSCC from two microarray datasets in a public
database. To do so, we transformed the microarray datasets
using ranks of gene expressions, because the datasets had
different expression scales even though they were
generated on the same platform. This method could be
useful when using multiple datasets that are created for
the same research purpose. By combining these
accumulated datasets, we can detect more reliable
information due to the increased sample size. It saves time
and money and avoids repeating experiments.